Abstract

A sandwich likelihood correction is proposed to remedy an inferential limitation of the 
Bayesian quantile regression approach based on the misspecified asymmetric Laplace density, 
by leveraging the benefits of the approach. Supporting theoretical results and simulations are 
presented. 
1 Introduction


Quantile regression is used to model conditional quantiles of independently distributed responses $\{Y_i\}_{i=1}^n$ given $p$-dimensional covariate vectors $\{X_i\}_{i=1}^n$. The frequentist approach to modeling the $\tau$th quantile ($0<\tau<1$), as proposed by Koenker and Bassett (1978), involves solving the problem

$$\hat{\beta}_n = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\left(Y_i - X_i^T\beta\right), \qquad (1)$$

where $\rho_\tau(u) = u\,(\tau - I(u<0))$, with $I(\cdot)$ being the indicator function. This procedure is equivalent to maximum likelihood estimation (MLE) if the responses are assumed to follow the asymmetric Laplace distribution (ALD), whose probability density function (p.d.f.) is given by

$$f_{\mu_i}(y) = \tau(1-\tau)\exp\{-\rho_\tau(y-\mu_i)\}, \quad \text{with } \mu_i = X_i^T\beta \text{ and } y\in(-\infty,\infty). \qquad (2)$$
It is easy to check that $\mu_i$ is the $\tau$th quantile with respect to the p.d.f. $f_{\mu_i}$.
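As a quick numerical illustration of equations (1) and (2), the Python sketch below (our own, using only the standard library; the function names are hypothetical) evaluates the check loss $\rho_\tau$ and the unit-scale ALD density, checks by numerical integration that the ALD assigns mass $\tau$ to $(-\infty,\mu]$, and confirms that in the intercept-only case a minimizer of (1) is a sample $\tau$th quantile.

```python
import math


def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - I(u < 0)) from equation (1)."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_pdf(y, mu, tau):
    """Unit-scale asymmetric Laplace density of equation (2)."""
    return tau * (1.0 - tau) * math.exp(-check_loss(y - mu, tau))


def ald_mass_below(mu, tau, lower=-60.0, steps=200_000):
    """Midpoint-rule approximation of the ALD probability of the interval (lower, mu]."""
    h = (mu - lower) / steps
    return h * sum(ald_pdf(lower + (k + 0.5) * h, mu, tau) for k in range(steps))


def intercept_only_qr(ys, tau):
    """Minimize sum_i rho_tau(y_i - b) over b; a minimizer can always be found among the data."""
    return min(ys, key=lambda b: sum(check_loss(y - b, tau) for y in ys))


print(round(ald_mass_below(0.0, 0.25), 4))
print(intercept_only_qr(list(range(1, 11)), 0.25))
```

For the ten observations $1,\ldots,10$ and $\tau=0.25$, the minimizer is the third order statistic: exactly 2 observations lie strictly below 3 and 3 lie at or below it, which brackets $n\tau=2.5$.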


The focus of this paper is a widely used Bayesian approach to the problem proposed by Yu and Moyeed (2001), where the Bayesian posterior is obtained by assuming the likelihood $f_{\mu_i}$ for $Y_i$ and a prior $\Pi(\cdot)$ for $\beta$. The approach is computationally attractive, especially due to the location-scale mixture normal representation of the ALD (see Kozumi and Kobayashi (2011)), and is known to yield consistent posterior estimates even if the ALD is a misspecification (see Sriram et al. (2013)). See also Yue and Rue (2011), Benoit and Van den Poel (2012), Alhamzawi and Yu (2013) and Waldmann et al. (2013).

The aim of this paper is to highlight and remedy an inferential limitation of the Bayesian quantile regression approach based on the ALD, when the ALD model is possibly misspecified. From a classical Bayesian point of view, where the objective is to draw inference on the “true” underlying quantile regression parameters ($\beta_0$), it is desirable that Bayesian inference coincides with frequentist inference as the size of the data increases. For this, two “frequentist” asymptotic properties need to hold: (i) posterior consistency, i.e. the posterior asymptotically concentrates around the “true” parameter value, and (ii) the “coverage property”, i.e. the $100(1-\alpha)\%$ Bayesian credible sets asymptotically merge with the $100(1-\alpha)\%$ frequentist confidence sets around $\hat{\beta}_n$. Even from a subjectivist Bayesian point of view, which does not subscribe to the idea of a “true parameter” value, Diaconis and Freedman (1986) argue that violation of posterior consistency is undesirable. Such a violation would mean that two experts starting with different priors can completely diverge in their opinions about the predictive distributions, even as more data become available. Further, small-sample posterior inference based on a possibly misspecified likelihood is not straightforward to justify. In such cases, checking the large-sample (asymptotic) properties is a way to avoid using an inappropriate likelihood.

The ALD is mainly used as a “working likelihood” for Bayesian quantile inference, and it is more often than not a misspecification of the true underlying likelihood. While posterior consistency still holds under suitable conditions (see Sriram et al. (2013)), we find that the “coverage property” may not hold. An undesirable consequence is that a narrow Bayesian credible interval could give a false sense of certainty about a parameter, which is an artifact of the misspecified likelihood rather than an actual gain from the Bayesian approach. This paper proposes a sandwich likelihood method to remedy this issue, while still leveraging the benefits of using the ALD.

Section 2 describes the sandwich likelihood method. Supporting theoretical results and simulations are presented in Sections 3 and 4, respectively.
2 The Sandwich Likelihood Method


Let $\tau\in(0,1)$ be fixed. Suppose $\{Y_i, i=1,2,\ldots,n\}$ are independent but non-identically distributed (i.n.i.d.) with $Y_i\sim p_i(\cdot)$. Let the “true” $\tau$th quantile of $Y_i$ be given by $Q_\tau(X_i)=X_i^T\beta_0$, where $X_i$ is a vector of $p$-dimensional non-random covariates. We model the $\tau$th quantile as $Q_\tau(X_i)=X_i^T\beta$ along with a proper prior $\Pi$ (with p.d.f. $\pi$) for $\beta$. We write the posterior distribution of $\beta$ given the data $Y_1,Y_2,\ldots,Y_n$ as

$$\Pi_n(\beta\in B)=\frac{\int_B f_\beta^{(n)}\,\pi(\beta)\,d\beta}{\int f_\beta^{(n)}\,\pi(\beta)\,d\beta}, \quad \text{where } f_\beta^{(n)}=\prod_{i=1}^{n} f_{X_i^T\beta}(Y_i). \qquad (3)$$


The sandwich likelihood method is motivated by Müller (2013) and is based on two observations.

(a) As seen later in Theorem 1, the posterior distribution $\Pi_n(\cdot)$ of $\beta$ is asymptotically equivalent to a normal distribution centred at $\hat{\beta}_n$ (as in equation (1)) with covariance matrix $\frac{1}{n}V^{-1}$, where

$$V=\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} p_i(X_i^T\beta_0)\,X_iX_i^T. \qquad (4)$$


(b) It follows from Koenker (2005) (page 74) that $\hat{\beta}_n$ is asymptotically normal with mean $\beta_0$ and (“sandwich”) covariance matrix $\frac{1}{n}\Sigma$, where

$$\Sigma=\tau(1-\tau)\,V^{-1}SV^{-1}, \quad \text{with } S=\lim_{n\to\infty}S_n, \quad S_n=\frac{1}{n}\sum_{i=1}^{n}X_iX_i^T. \qquad (5)$$


Since $V^{-1}\neq\Sigma$ in general, (a) and (b) imply that the credible intervals from $\Pi_n(\cdot)$ will not asymptotically match the normal frequentist confidence intervals, thus violating the “coverage property” described in the introduction.
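To gauge the size of the mismatch, consider the special case of i.i.d. errors whose common density $f$ takes the value $f(\xi_\tau)$ at its $\tau$th quantile $\xi_\tau$, so that $p_i(X_i^T\beta_0)=f(\xi_\tau)$ for all $i$. Then (4) gives $V=f(\xi_\tau)S$ and (5) gives $\Sigma=\tau(1-\tau)S^{-1}/f(\xi_\tau)^2$, so $V^{-1}=\{f(\xi_\tau)/(\tau(1-\tau))\}\,\Sigma$ and every unit-scale ALD credible interval is asymptotically wider than the sandwich interval by the factor $\sqrt{f(\xi_\tau)/(\tau(1-\tau))}$. The Python sketch below is our own back-of-the-envelope illustration of this special case, evaluated for the error distributions of Models 1 and 2 in Section 4; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def ald_width_inflation(density_at_quantile, tau):
    """sqrt(f / (tau * (1 - tau))): ratio of unit-scale ALD to sandwich standard errors
    in the i.i.d.-error special case, where V = f * S."""
    return math.sqrt(density_at_quantile / (tau * (1.0 - tau)))


def normal_density_at_quantile(tau):
    """Model 1 errors: standard normal density at its tau-th quantile."""
    z = NormalDist().inv_cdf(tau)
    return NormalDist().pdf(z)


def exp_density_at_quantile(tau):
    """Model 2 errors: Exp(1) = Gamma(1, 1); at its quantile -log(1 - tau) the density is 1 - tau."""
    return 1.0 - tau


for tau in (0.25, 0.75):
    print(tau,
          round(ald_width_inflation(normal_density_at_quantile(tau), tau), 2),
          round(ald_width_inflation(exp_density_at_quantile(tau), tau), 2))
```

This gives about 1.30 for normal errors at both quantiles, and 2.0 ($\tau=0.25$) or 1.15 ($\tau=0.75$) for exponential errors. For comparison, the ALD-to-QR length ratios in the Model 1 and Model 2 rows of Table 1(a) are of roughly this size (e.g. 0.55/0.43 and 0.36/0.18 for $\alpha$ at $\tau=0.25$).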

Let $\bar{\beta}_n$ and $\frac{1}{n}V_n^{-1}$ be the mean and covariance matrix of $\beta$ under the posterior distribution $\Pi_n(\cdot)$. The “sandwich likelihood” method to remedy this issue can be described in two steps.

Step 1 (ALD step). Using the posterior distribution $\Pi_n(\cdot)$, based purely on the ALD likelihood $Y_i\sim f_{X_i^T\beta}$ and the prior $\Pi(\cdot)$, compute the posterior mean $\bar{\beta}_n$ and the posterior covariance matrix $\frac{1}{n}V_n^{-1}$.
Step 2 (Sandwich likelihood step). Define the “sandwich likelihood” as

$$g_\beta^{(n)}=\frac{1}{(2\pi)^{p/2}\,|\Sigma_n/n|^{1/2}}\exp\left\{-\frac{n}{2}(\bar{\beta}_n-\beta)^T\Sigma_n^{-1}(\bar{\beta}_n-\beta)\right\}, \qquad (6)$$

where

$$\Sigma_n=\tau(1-\tau)\,V_n^{-1}S_nV_n^{-1}, \qquad (7)$$

and recompute a new posterior distribution for $\beta$ as follows:

$$\Pi_n^{prop}(\beta\in B)=\frac{\int_B g_\beta^{(n)}\,\pi(\beta)\,d\beta}{\int g_\beta^{(n)}\,\pi(\beta)\,d\beta}. \qquad (8)$$
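When the prior $\Pi$ is Gaussian, as with the product-normal priors used in Section 4, the posterior (8) is available in closed form: as a function of $\beta$, the sandwich likelihood (6) is a normal kernel with mean $\bar{\beta}_n$ and covariance $\Sigma_n/n$, so the posterior precision is $n\Sigma_n^{-1}+P_0^{-1}$ and the posterior mean is the precision-weighted combination of $\bar{\beta}_n$ and the prior mean $m_0$. The pure-Python sketch below assumes a $N(m_0,P_0)$ prior; the names `m0`, `P0` and the helper functions are our own, and for a non-Gaussian prior (8) would instead have to be sampled, e.g. by MCMC.

```python
def mat_inv(A):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("matrix is singular")
        M[c], M[piv] = M[piv], M[c]
        pv = M[c][c]
        M[c] = [v / pv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def step2_posterior_gaussian_prior(beta_bar, Sigma_n, n, m0, P0):
    """Mean and covariance of Pi_n^prop in (8) under a hypothetical N(m0, P0) prior.

    The sandwich likelihood (6) is N(beta_bar, Sigma_n / n) in beta, so the
    likelihood precision n * Sigma_n^{-1} and the prior precision P0^{-1} add.
    """
    p = len(beta_bar)
    L = [[n * v for v in row] for row in mat_inv(Sigma_n)]   # likelihood precision
    Q = mat_inv(P0)                                            # prior precision
    prec = [[L[i][j] + Q[i][j] for j in range(p)] for i in range(p)]
    cov = mat_inv(prec)
    rhs = [sum(L[i][j] * beta_bar[j] + Q[i][j] * m0[j] for j in range(p)) for i in range(p)]
    mean = [sum(cov[i][j] * rhs[j] for j in range(p)) for i in range(p)]
    return mean, cov


# Illustrative (made-up) inputs with p = 2 and n = 50
mean, cov = step2_posterior_gaussian_prior(
    beta_bar=[1.0, 2.0], Sigma_n=[[4.0, 0.5], [0.5, 1.0]], n=50,
    m0=[0.9, 2.1], P0=[[1.0, 0.0], [0.0, 1.0]])
for j in range(2):
    half = 1.96 * cov[j][j] ** 0.5          # equal-tailed 95% interval from the normal posterior
    print(f"beta[{j}]: {mean[j] - half:.3f} to {mean[j] + half:.3f}")
```

With a very diffuse prior ($P_0^{-1}\approx 0$) the Step 2 posterior reduces to $N(\bar{\beta}_n,\Sigma_n/n)$, i.e. the sandwich covariance centred at the Step 1 posterior mean.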


Theorem 2 in Section 3 ensures that the credible sets based on the new posterior from Step 2 merge asymptotically with the frequentist confidence sets. Here, we make a few remarks.

Remark 1. Note that the proposed sandwich likelihood is based on the Bayesian posterior mean $\bar{\beta}_n$ from the ALD step (i.e. Step 1). We denote this proposed approach by “SLBA”. An alternative approach to the sandwich likelihood in Step 2, along the lines of Müller (2013), is obtained by using the classical quantile regression estimator $\hat{\beta}_n$ instead of $\bar{\beta}_n$ in equation (6). We denote this alternative by “SLQR”. While both methods are asymptotically equivalent (see Lemma 2(a)) and work similarly for relatively flat priors or large sample sizes, the simulations in Section 4 suggest that the proposed SLBA method may be better suited to small sample sizes with informative priors.

Remark 2. Markov chain Monte Carlo (MCMC) approaches for implementing Step 1 are now well known (e.g. see Yue and Rue (2011)). Therefore, $\bar{\beta}_n$ and $\frac{1}{n}V_n^{-1}$ can be obtained as the mean and covariance matrix of the MCMC draws of $\beta$ from Step 1. Consequently, $\Sigma_n$ (as in equation (7)), which is needed for Step 2, can be easily computed.
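A minimal pure-Python sketch of Remark 2, assuming the Step 1 MCMC output is available as a list of post-burn-in draws of $\beta$ (each a list of length $p$) and the design as a list of covariate vectors $X_i$. It forms $\bar{\beta}_n$, $V_n^{-1}$ as $n$ times the posterior covariance, $S_n$ from (5) and $\Sigma_n$ from (7). The function names are hypothetical, and the optional `sigma0` argument is our addition: it applies the rescaling for an ALD whose scale is fixed at $\sigma_0$, as discussed in Remark 4 (for $\sigma_0=1$ it has no effect).

```python
def mean_vec(draws):
    """Componentwise mean of the MCMC draws."""
    m = len(draws)
    return [sum(d[j] for d in draws) / m for j in range(len(draws[0]))]


def cov_mat(draws):
    """Sample covariance (divisor m - 1) of the MCMC draws."""
    m, mu = len(draws), mean_vec(draws)
    p = len(mu)
    return [[sum((d[j] - mu[j]) * (d[k] - mu[k]) for d in draws) / (m - 1)
             for k in range(p)] for j in range(p)]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def sandwich_inputs(draws, X, tau, sigma0=1.0):
    """Return (beta_bar_n, V_n^{-1}, S_n, Sigma_n) from the Step 1 output."""
    n, p = len(X), len(X[0])
    beta_bar = mean_vec(draws)
    # Step 1 posterior covariance is approximately (sigma0 / n) * V^{-1}
    V_inv = [[n * c / sigma0 for c in row] for row in cov_mat(draws)]
    S = [[sum(x[j] * x[k] for x in X) / n for k in range(p)] for j in range(p)]
    VSV = mat_mul(mat_mul(V_inv, S), V_inv)
    Sigma = [[tau * (1.0 - tau) * v for v in row] for row in VSV]   # equation (7)
    return beta_bar, V_inv, S, Sigma
```

The returned $\bar{\beta}_n$ and $\Sigma_n$ are exactly the inputs required by the sandwich likelihood (6) in Step 2.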

Remark 3. Under the assumptions made in Section 3, Lemma 2 shows that $\bar{\beta}_n$ and $V_n^{-1}$ are consistent for $\beta_0$ and $V^{-1}$, respectively. Consequently, $\Sigma_n$ is consistent for $\Sigma$. It is worth noting that estimation of $\Sigma$ in the i.n.i.d. case is in general a challenging problem (see Section 3.4.2 of Koenker (2005)), since the true underlying densities $\{p_i(\cdot)\}_{i\geq1}$ are not known. This method is a simple alternative.


3 Theoretical Results 

Recall that the true underlying p.d.f. of $Y_i$ is $p_i$. By way of notation, let $P^{(n)}$ denote the product probability $p_1\times p_2\times\cdots\times p_n$ and let $P$ denote the infinite product probability $p_1\times p_2\times\cdots$. Let $P(\cdot)$ and $E[\cdot]$ denote the probability and expectation with respect to the true product probability. We define $Z_i=Y_i-X_i^T\beta_0$ and note that $P(Z_i<0\mid X_i)=\tau$.


Our Assumptions 1(a) and 2 to 4 are exactly those in Sriram et al. (2013) for posterior consistency. Assumption 1(b) is additionally introduced to help ensure consistency of the posterior mean and variance.

Assumption 1. 

(a) $\Pi$ is a proper prior with a bounded and continuous p.d.f. $\pi$, with $\pi(\beta_0)>0$.

(b) $\int\|\beta\|^2\,\pi(\beta)\,d\beta<\infty$.


The second assumption requires that the covariates be bounded. 

Assumption 2. $\exists\,M>0$ such that $\|X_i\|<M$ for all $i$.

The first part of the next assumption essentially says that the non-intercept covariates (after appropriate centering) take values in all orthants (quadrants) of the Euclidean space. In particular, this implies that they cannot be collinear. The second part of the assumption requires that the true underlying likelihood put positive mass around the true quantile, in particular ensuring that it is unique.

Assumption 3. 

(a) Let the first coordinate of $X_i$ be identically 1, representing the intercept. After appropriate centering of the other coordinates, $\exists\,\epsilon_0>0$ such that $\liminf_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}I(X_i\in D)>0$ for all $D\subset\mathbb{R}^p$ of the form $\{1\}\times U_2\times\cdots\times U_p$, where for some $j$, $U_j$ is either $(\epsilon_0,\infty)$ or $(-\infty,-\epsilon_0)$, and for $k\neq j$, $U_k$ is either $(0,\infty)$ or $(-\infty,0)$.

(b) For some $C>0$ and all sufficiently small $\Delta>0$, $P(0<Z_i<\Delta)>C\Delta$ and $P(-\Delta<Z_i<0)>C\Delta$ for all $i$.

Assumption 4 is a technical condition required for the Strong Law of Large Numbers in the i.n.i.d. case.



Lemma 1. Let Assumptions 1 to 4 hold, and $k\in\{0,1,2\}$. Then, for any sequence $M_n\to\infty$,
For $k=0$, Lemma 1 gives posterior consistency at the $\sqrt{n}$-rate (the same conclusion as in Theorem 2(b) of Sriram et al. (2013)). The results for $k=1$ and $k=2$ are useful in establishing the next lemma, which formalizes the fact that the classical estimator ($\hat{\beta}_n$) and the posterior mean from Step 1 ($\bar{\beta}_n$) are asymptotically equivalent, and that the estimators $(V_n^{-1},\Sigma_n)$ are consistent for $(V^{-1},\Sigma)$.

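Lemma 2. Under Assumptions 1 to 4,

(a) $\sqrt{n}(\bar{\beta}_n-\hat{\beta}_n)\to0$ in probability $[P]$,

(b) $V_n^{-1}\to V^{-1}$ in probability $[P]$, and hence $\Sigma_n\to\Sigma$ in probability $[P]$.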


The asymptotic normality of the posterior is derived by applying the Bernstein-von Mises theorem for misspecified models given by Kleijn and van der Vaart (2012). For that, a key requirement, apart from $\sqrt{n}$-posterior consistency, is “local asymptotic normality” (LAN). For the ALD model on i.n.i.d. data, the LAN property can be shown by making the following assumption on the boundedness and continuity of the true underlying densities $p_i$.

Assumption 5. 

For some $C,\eta>0$ and for all $\beta$ in some small enough neighborhood of $\beta_0$, the p.d.f.s $p_i$ satisfy:

(a) $\{p_i(X_i^T\beta_0),\ i\geq1\}$ are uniformly bounded away from $\infty$.

(b) $|p_i(X_i^T\beta)-p_i(X_i^T\beta_0)|\leq C\,|X_i^T\beta-X_i^T\beta_0|^{\eta}$ for all $i$.

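Lemma 3 (LAN property). Under Assumptions 2 and 5, the LAN property holds, i.e., for any compact set $K\subset\mathbb{R}^p$,

$$\sup_{\delta\in K}\left|\log\frac{f^{(n)}_{\beta_0+\delta/\sqrt{n}}}{f^{(n)}_{\beta_0}}-\delta^TV\Delta_{n,\beta_0}+\frac{1}{2}\delta^TV\delta\right|\to0 \text{ in probability } [P], \qquad (9)$$

where

$$\Delta_{n,\beta_0}=V^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(\tau-I(Y_i\leq X_i^T\beta_0)\right)X_i. \qquad (10)$$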
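For concreteness, the centring term $\Delta_{n,\beta_0}$ of (10) can be computed directly from data once $V^{-1}$ (or an estimate of it, such as $V_n^{-1}$ from Step 1) is supplied. The short Python sketch below is our own illustration and the function name is hypothetical; by Lemma 4(b), this quantity is asymptotically equivalent to $\sqrt{n}(\hat{\beta}_n-\beta_0)$.

```python
import math


def lan_centering(Y, X, beta0, V_inv, tau):
    """Delta_{n,beta0} = V^{-1} n^{-1/2} sum_i (tau - I(Y_i <= X_i^T beta0)) X_i, as in (10)."""
    n, p = len(Y), len(beta0)
    score = [0.0] * p
    for y, x in zip(Y, X):
        fitted = sum(xj * bj for xj, bj in zip(x, beta0))
        w = tau - (1.0 if y <= fitted else 0.0)   # subgradient weight of the check loss
        for j in range(p):
            score[j] += w * x[j]
    scaled = [s / math.sqrt(n) for s in score]
    return [sum(V_inv[i][j] * scaled[j] for j in range(p)) for i in range(p)]
```

In the intercept-only case ($X_i\equiv1$) the sum simply counts how many observations fall at or below $\beta_0$ relative to the target proportion $\tau$.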


Proofs of Lemmas 1, 2 and 3 are included in the Appendix and are essentially extensions of ideas from Sriram et al. (2013), Kleijn and van der Vaart (2012) and Koenker (2005), respectively. The next lemma establishes the asymptotic connection between the posterior probability and the normal distribution. Let $\Phi(B,\mu,\Sigma)$ denote the probability of a set $B$ under the multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$. Then, we have the following result.
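Lemma 4. Under Assumptions 1 to 5,

(a) $$\sup_B\left|\Pi_n\left(\sqrt{n}(\beta-\beta_0)\in B\right)-\Phi\left(B,\Delta_{n,\beta_0},V^{-1}\right)\right|\to0 \text{ in probability } [P]. \qquad (11)$$

(b) $$\sqrt{n}(\hat{\beta}_n-\beta_0)-\Delta_{n,\beta_0}\to0 \text{ in probability } [P]. \qquad (12)$$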


Proof. Part (a) is an immediate consequence of Lemma 1 (for $k=0$) and Lemma 3, since these are precisely the conditions needed to apply Theorem 2.1 of Kleijn and van der Vaart (2012). Part (b) is shown in Koenker (2005) (see page 122, equation 4.4). □


Our first theorem is immediate from Lemma 4 and formalizes Step 1.

Theorem 1. Let $\Pi_n$ be as in equation (3). Then, under Assumptions 1 to 5,

$$\sup_B\left|\Pi_n\left(\sqrt{n}(\beta-\beta_0)\in B\right)-\Phi\left(B,\sqrt{n}(\hat{\beta}_n-\beta_0),V^{-1}\right)\right|\to0 \text{ in probability } [P].$$


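The next theorem formalizes Step 2 of the proposed sandwich likelihood approach.

Theorem 2. Let $\Pi_n$ and $\Pi_n^{prop}$ be as in equations (3) and (8). Then, under Assumptions 1 to 5,

(a) $$\sup_B\left|\Pi_n\left(\sqrt{n}(\beta-\beta_0)\in B\right)-\Phi\left(B,\sqrt{n}(\bar{\beta}_n-\beta_0),V^{-1}\right)\right|\to0 \text{ in probability } [P],$$

(b) $$\sup_B\left|\Pi_n^{prop}\left(\sqrt{n}(\beta-\beta_0)\in B\right)-\Phi\left(B,\sqrt{n}(\bar{\beta}_n-\beta_0),\Sigma\right)\right|\to0 \text{ in probability } [P].$$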


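Proof. Part (a) follows from Theorem 1 and Lemma 2(a). To see part (b), first note that $\Sigma_n\to\Sigma$ in probability $[P]$ (by Lemma 2). Since $\sqrt{n}(\hat{\beta}_n-\beta_0)$ is asymptotically normal (Koenker (2005)), it is bounded in probability. Then, by Lemma 2(a), $\sqrt{n}(\bar{\beta}_n-\beta_0)$ is also bounded in probability. Hence, $g_\beta^{(n)}$ satisfies the LAN property because

$$\sup_{\delta\in K}\left|\log\frac{g^{(n)}_{\beta_0+\delta/\sqrt{n}}}{g^{(n)}_{\beta_0}}-\delta^T\Sigma^{-1}\sqrt{n}(\bar{\beta}_n-\beta_0)+\frac{1}{2}\delta^T\Sigma^{-1}\delta\right|$$

$$=\sup_{\delta\in K}\left|\delta^T(\Sigma_n^{-1}-\Sigma^{-1})\sqrt{n}(\bar{\beta}_n-\beta_0)-\frac{1}{2}\delta^T(\Sigma_n^{-1}-\Sigma^{-1})\delta\right|\to0 \text{ in probability } [P].$$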























The result is immediate from Theorem 2.1 of Kleijn and van der Vaart (2012), provided we show

$$E\left[\Pi_n^{prop}\left(\sqrt{n}\|\beta-\beta_0\|>M_n\right)\right]\to0 \text{ for any sequence } M_n\to\infty. \qquad (13)$$

Let $B_n=\{\beta:\sqrt{n}\|\beta-\beta_0\|>M_n\}$. Making the change of variable $\sqrt{n}(\beta-\bar{\beta}_n)=t$, we can write

$$\Pi_n^{prop}(B_n)=\frac{\int_{\{\sqrt{n}\|\bar{\beta}_n+t/\sqrt{n}-\beta_0\|>M_n\}}\frac{1}{(2\pi)^{p/2}\det(\Sigma_n)^{1/2}}\,e^{-\frac{1}{2}t^T\Sigma_n^{-1}t}\,\pi(\bar{\beta}_n+t/\sqrt{n})\,dt}{\int\frac{1}{(2\pi)^{p/2}\det(\Sigma_n)^{1/2}}\,e^{-\frac{1}{2}t^T\Sigma_n^{-1}t}\,\pi(\bar{\beta}_n+t/\sqrt{n})\,dt}.$$

Since $\bar{\beta}_n\to\beta_0$ in probability $[P]$, $M_n\to\infty$, and $\pi(\cdot)$ is bounded and continuous by Assumption 1, an application of the Skorohod representation theorem to the sequence $\{\bar{\beta}_n\}_{n\geq1}$, along with the dominated convergence theorem, implies that the numerator of the above expression converges to zero and the denominator to $\pi(\beta_0)$. This in turn implies that $E\left[\Pi_n^{prop}\left(\sqrt{n}\|\beta-\beta_0\|>M_n\right)\right]\to0$. □
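The LAN step in the proof of Theorem 2(b) is an exact algebraic identity for the Gaussian sandwich likelihood (6): the remainder equals $\delta^T(\Sigma_n^{-1}-\Sigma^{-1})u-\frac{1}{2}\delta^T(\Sigma_n^{-1}-\Sigma^{-1})\delta$ with $u=\sqrt{n}(\bar{\beta}_n-\beta_0)$. A scalar ($p=1$) numerical check in Python, using arbitrary illustrative values of our own choosing (not taken from the paper), is:

```python
import math


def log_g(beta, beta_bar, sigma_n, n):
    """log of the scalar (p = 1) sandwich likelihood (6): N(beta_bar, sigma_n / n) in beta."""
    var = sigma_n / n
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (beta_bar - beta) ** 2 / var


def lan_remainder(delta, beta0, beta_bar, sigma_n, sigma, n):
    """log g(beta0 + delta/sqrt(n)) - log g(beta0) minus the LAN approximation with Sigma."""
    u = math.sqrt(n) * (beta_bar - beta0)
    log_ratio = (log_g(beta0 + delta / math.sqrt(n), beta_bar, sigma_n, n)
                 - log_g(beta0, beta_bar, sigma_n, n))
    return log_ratio - (delta * u / sigma - 0.5 * delta ** 2 / sigma)


def remainder_closed_form(delta, beta0, beta_bar, sigma_n, sigma, n):
    """delta * (1/sigma_n - 1/sigma) * u - 0.5 * delta^2 * (1/sigma_n - 1/sigma)."""
    u = math.sqrt(n) * (beta_bar - beta0)
    d = 1.0 / sigma_n - 1.0 / sigma
    return delta * d * u - 0.5 * delta ** 2 * d


args = dict(delta=1.3, beta0=2.0, beta_bar=2.05, sigma_n=1.8, sigma=2.0, n=400)
print(lan_remainder(**args), remainder_closed_form(**args))
```

Both expressions agree up to rounding, and the remainder vanishes when $\Sigma_n=\Sigma$; with $\Sigma_n\to\Sigma$ and $u$ bounded in probability it therefore tends to zero uniformly over compact sets of $\delta$.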

Remark 4 (On using ALD with a scale parameter). In some applications, one may carry out Bayesian quantile regression using an ALD that includes a scale parameter ($\sigma>0$), whose p.d.f. is given by

$$f_{\mu_i,\sigma}(y)=\frac{\tau(1-\tau)}{\sigma}\exp\left\{-\frac{\rho_\tau(y-\mu_i)}{\sigma}\right\}, \quad \text{with } \mu_i=X_i^T\beta \text{ and } y\in(-\infty,\infty). \qquad (14)$$

Here, either the scale is fixed (e.g. $\sigma=1$) or endowed with a prior. In the latter case, the scale is essentially a nuisance parameter for Bayesian inference on $\beta$. The proposed approach can be easily modified to incorporate the scale parameter. Suppose the scale is fixed at $\sigma=\sigma_0$. Then, under Assumptions 1 to 5, it is easy to check that the conclusions of Theorem 2(a) will hold with $V^{-1}$ replaced by $\sigma_0V^{-1}$. Accordingly, $V_n^{-1}$ in Step 1 can be obtained as $\frac{n}{\sigma_0}$ times the estimated covariance matrix under $\Pi_n$. Step 2 would remain the same.

Suppose $\sigma$ is endowed with a prior on a compact interval $[\sigma_1,\sigma_2]$. Then, Sriram et al. (2013) show, under suitable conditions, that the posterior distribution of $\sigma$ given $Y_1,Y_2,\ldots,Y_n$ concentrates around a value $\sigma_0$ given by

$$\sigma_0=\arg\max_{\sigma\in[\sigma_1,\sigma_2]}\left\{\log\frac{\tau(1-\tau)}{\sigma}-\frac{C^*}{\sigma}\right\}, \qquad (15)$$

where

$$C^*=\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}E\left[\rho_\tau(Z_i)\right] \quad \text{and} \quad Z_i=Y_i-X_i^T\beta_0.$$

In this case, it is reasonable to expect that, under suitable conditions, the conclusions of Theorem 2(a) will hold with $V^{-1}$ replaced by $\sigma_0V^{-1}$, only now with $\sigma_0$ as in equation (15). It appears that a formal derivation of this result requires the Bernstein-von Mises theorem for misspecified models (as in Kleijn and van der Vaart (2012)) to be developed in the presence of a nuisance parameter. Hence, a formal investigation is deferred to future work. Here, one could first estimate $\sigma_0$ from Step 1 using the MCMC simulations of the parameter $\sigma$ (e.g. by the posterior mean $\hat{\sigma}_0$), and then obtain $\frac{1}{n}V_n^{-1}$ as $\frac{1}{\hat{\sigma}_0}$ times the covariance matrix under $\Pi_n$.
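The maximizer in (15) can be made explicit. Writing the objective as $h(\sigma)=\log\{\tau(1-\tau)\}-\log\sigma-C^*/\sigma$, with $C^*>0$ the limiting average expected check loss defined after (15), a short calculation (ours, not part of the original argument) gives:

```latex
h'(\sigma) = -\frac{1}{\sigma} + \frac{C^*}{\sigma^2} = \frac{C^* - \sigma}{\sigma^2}
\quad\Longrightarrow\quad
h \text{ is increasing on } (0, C^*) \text{ and decreasing on } (C^*, \infty),
\quad\Longrightarrow\quad
\sigma_0 = \min\{\max\{C^*, \sigma_1\}, \sigma_2\}.
```

In particular, if $C^*\in[\sigma_1,\sigma_2]$, then $\sigma_0=C^*$, so the posterior mean of $\sigma$ from Step 1 can be read as an estimate of the average check loss of the true errors.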


Table 1: Comparison of methods when the data size is large (N=2000). The numbers within parentheses are coverage and interval length (COV in %, LEN).

(a): N=2000, relatively flat prior on β, ALD scale parameter fixed at σ=1.

| Model | τ | α: QR | α: ALD | α: SLQR | α: SLBA | β1: QR | β1: ALD | β1: SLQR | β1: SLBA | β2: QR | β2: ALD | β2: SLQR | β2: SLBA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.25 | (94,0.43) | (98,0.55) | (95,0.42) | (96,0.42) | (94,0.13) | (98,0.17) | (94,0.13) | (95,0.13) | (94,0.26) | (98,0.34) | (93,0.26) | (92,0.26) |
| 2 | 0.25 | (92,0.18) | (100,0.36) | (92,0.18) | (94,0.18) | (94,0.06) | (100,0.11) | (94,0.06) | (95,0.06) | (94,0.11) | (100,0.22) | (94,0.11) | (96,0.11) |
| 3 | 0.25 | (96,2.54) | (72,1.31) | (96,2.52) | (95,2.52) | (98,0.92) | (64,0.44) | (94,0.92) | (94,0.92) | (96,2.17) | (68,0.96) | (96,2.15) | (96,2.15) |
| 4 | 0.25 | (95,2.76) | (72,1.36) | (95,2.72) | (95,2.72) | (95,1) | (68,0.46) | (96,0.99) | (97,0.99) | (94,2.4) | (57,1) | (92,2.33) | (92,2.33) |
| 1 | 0.75 | (94,0.42) | (98,0.53) | (94,0.41) | (94,0.41) | (94,0.13) | (98,0.16) | (92,0.13) | (94,0.13) | (94,0.26) | (100,0.34) | (93,0.26) | (94,0.26) |
| 2 | 0.75 | (94,0.52) | (98,0.6) | (91,0.51) | (92,0.51) | (93,0.16) | (96,0.18) | (92,0.16) | (92,0.16) | (96,0.33) | (98,0.38) | (94,0.33) | (96,0.33) |
| 3 | 0.75 | (95,1.75) | (83,1.07) | (94,1.72) | (94,1.72) | (97,0.64) | (78,0.36) | (96,0.63) | (96,0.63) | (94,1.51) | (70,0.79) | (92,1.48) | (92,1.48) |
| 4 | 0.75 | (94,2.68) | (66,1.31) | (94,2.62) | (92,2.62) | (96,0.97) | (64,0.44) | (94,0.95) | (94,0.95) | (96,2.27) | (68,0.96) | (94,2.19) | (94,2.19) |


(b): N=2000, relatively flat prior on β, inverse gamma prior on ALD scale parameter.

| Model | τ | α: QR | α: ALD | α: SLQR | α: SLBA | β1: QR | β1: ALD | β1: SLQR | β1: SLBA | β2: QR | β2: ALD | β2: SLQR | β2: SLBA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.25 | (94,0.41) | (86,0.3) | (95,0.41) | (94,0.41) | (96,0.13) | (86,0.09) | (96,0.12) | (95,0.12) | (98,0.26) | (83,0.19) | (96,0.26) | (96,0.26) |
| 2 | 0.25 | (96,0.18) | (96,0.16) | (96,0.18) | (97,0.18) | (97,0.05) | (96,0.05) | (96,0.05) | (97,0.05) | (91,0.11) | (90,0.1) | (91,0.11) | (92,0.11) |
| 3 | 0.25 | (96,2.44) | (96,2.17) | (96,2.4) | (98,2.4) | (96,0.88) | (92,0.72) | (96,0.87) | (97,0.87) | (93,2.11) | (88,1.62) | (92,2.08) | (94,2.08) |
| 4 | 0.25 | (94,2.71) | (86,2.11) | (91,2.63) | (94,2.63) | (94,0.98) | (86,0.7) | (92,0.96) | (92,0.96) | (95,2.36) | (80,1.59) | (92,2.32) | (92,2.32) |
| 1 | 0.75 | (95,0.42) | (84,0.3) | (93,0.41) | (94,0.41) | (96,0.13) | (82,0.09) | (94,0.13) | (94,0.13) | (95,0.27) | (84,0.19) | (96,0.27) | (96,0.27) |
| 2 | 0.75 | (96,0.53) | (80,0.35) | (94,0.52) | (94,0.52) | (96,0.16) | (80,0.11) | (94,0.16) | (93,0.16) | (95,0.34) | (80,0.22) | (92,0.33) | (94,0.33) |
| 3 | 0.75 | (92,1.68) | (80,1.25) | (90,1.65) | (90,1.65) | (89,0.62) | (76,0.42) | (89,0.61) | (88,0.61) | (94,1.54) | (78,0.97) | (93,1.5) | (94,1.5) |
| 4 | 0.75 | (94,2.63) | (91,2.07) | (94,2.58) | (94,2.58) | (96,0.98) | (84,0.7) | (94,0.96) | (94,0.96) | (92,2.36) | (80,1.59) | (90,2.33) | (90,2.33) |



(c): N=2000, informative prior on β, ALD scale parameter fixed at σ=1.

| Model | τ | α: QR | α: ALD | α: SLQR | α: SLBA | β1: QR | β1: ALD | β1: SLQR | β1: SLBA | β2: QR | β2: ALD | β2: SLQR | β2: SLBA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.25 | (96,0.42) | (100,0.53) | (98,0.41) | (96,0.41) | (96,0.13) | (100,0.16) | (96,0.13) | (96,0.13) | (96,0.26) | (100,0.34) | (95,0.26) | (96,0.26) |
| 2 | 0.25 | (94,0.18) | (100,0.35) | (95,0.17) | (98,0.17) | (96,0.06) | (100,0.11) | (96,0.05) | (96,0.05) | (95,0.11) | (100,0.22) | (96,0.11) | (95,0.11) |
| 3 | 0.25 | (98,2.43) | (78,1.19) | (98,1.82) | (99,1.82) | (96,0.88) | (69,0.4) | (96,0.69) | (97,0.69) | (96,2.12) | (68,0.91) | (96,1.72) | (96,1.72) |
| 4 | 0.25 | (96,2.68) | (76,1.24) | (96,1.94) | (98,1.94) | (97,0.98) | (72,0.42) | (95,0.74) | (96,0.74) | (94,2.36) | (68,0.95) | (92,1.82) | (96,1.82) |
| 1 | 0.75 | (94,0.42) | (98,0.53) | (93,0.4) | (95,0.4) | (94,0.13) | (98,0.16) | (92,0.12) | (94,0.12) | (94,0.26) | (100,0.34) | (93,0.26) | (93,0.26) |
| 2 | 0.75 | (94,0.52) | (98,0.59) | (90,0.49) | (92,0.49) | (93,0.16) | (96,0.18) | (91,0.15) | (94,0.15) | (96,0.33) | (98,0.38) | (95,0.33) | (97,0.33) |
| 3 | 0.75 | (95,1.75) | (84,1.02) | (94,1.45) | (95,1.45) | (97,0.64) | (78,0.34) | (96,0.54) | (96,0.54) | (94,1.51) | (72,0.78) | (94,1.33) | (94,1.33) |
| 4 | 0.75 | (94,2.68) | (70,1.24) | (94,1.93) | (96,1.93) | (96,0.97) | (66,0.41) | (93,0.74) | (95,0.74) | (96,2.27) | (68,0.93) | (96,1.78) | (96,1.78) |

(d): N=2000, informative prior on β, inverse gamma prior on ALD scale parameter.

| Model | τ | α: QR | α: ALD | α: SLQR | α: SLBA | β1: QR | β1: ALD | β1: SLQR | β1: SLBA | β2: QR | β2: ALD | β2: SLQR | β2: SLBA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.25 | (96,0.42) | (86,0.29) | (95,0.4) | (96,0.4) | (96,0.13) | (88,0.09) | (94,0.12) | (96,0.12) | (94,0.27) | (86,0.19) | (94,0.26) | (94,0.26) |
| 2 | 0.25 | (96,0.17) | (95,0.16) | (96,0.17) | (94,0.17) | (94,0.05) | (92,0.05) | (94,0.05) | (93,0.05) | (94,0.12) | (94,0.1) | (94,0.11) | (94,0.11) |
| 3 | 0.25 | (94,2.39) | (96,1.85) | (88,1.59) | (97,1.59) | (95,0.88) | (94,0.63) | (90,0.62) | (96,0.62) | (96,2.2) | (94,1.53) | (94,1.64) | (98,1.64) |
| 4 | 0.25 | (94,2.71) | (91,1.84) | (88,1.76) | (98,1.76) | (92,0.98) | (86,0.62) | (90,0.68) | (96,0.68) | (95,2.44) | (85,1.51) | (93,1.81) | (96,1.81) |
| 1 | 0.75 | (95,0.42) | (88,0.3) | (94,0.41) | (94,0.41) | (96,0.13) | (87,0.09) | (96,0.13) | (96,0.13) | (92,0.27) | (86,0.19) | (92,0.26) | (92,0.26) |
| 2 | 0.75 | (96,0.54) | (82,0.36) | (96,0.52) | (96,0.52) | (96,0.16) | (84,0.11) | (97,0.16) | (97,0.16) | (96,0.34) | (84,0.22) | (96,0.34) | (96,0.34) |
| 3 | 0.75 | (98,1.73) | (91,1.22) | (94,1.4) | (98,1.4) | (96,0.63) | (84,0.41) | (92,0.52) | (96,0.52) | (94,1.56) | (78,0.95) | (90,1.33) | (92,1.33) |
| 4 | 0.75 | (94,2.73) | (94,1.86) | (89,1.75) | (96,1.75) | (94,0.99) | (88,0.63) | (89,0.69) | (98,0.69) | (94,2.38) | (84,1.47) | (94,1.74) | (98,1.74) |
4 Simulation Study

Here, we study the performance of the proposed sandwich likelihood method (SLBA) by simulating data from different “true” underlying models. We work with two covariates $X_1$ and $X_2$, simulated from $N(3,1)$ (truncated between 1 and 1000) and Bernoulli(0.3), respectively. For each simulated model, we ensure that the true $\tau$th quantile is given by $q_\tau(X)=1+2X_1+3X_2$. The simulated models are described below. The first two models are a location-shifted normal and a location-shifted gamma, respectively, the third is a scaled gamma, and the fourth is a location-shifted and scaled normal.

Model 1. $Y=q_\tau(X)+\epsilon$, where $\epsilon=Z-\mu_\tau$, $Z\sim N(0,1)$ and $\mu_\tau$ is the $\tau$th quantile of $N(0,1)$.

Model 2. $Y=1+2X_1+3X_2-\mu_\tau+\epsilon$, where $\epsilon\sim\text{Gamma}(\text{shape}=1,\ \text{scale}=1)$ and $\mu_\tau$ is the $\tau$th quantile of Gamma(1,1).

Model 3. $Y\sim\text{Gamma}\left(\text{shape}=2,\ \text{scale}=q_\tau(X)/\mu_\tau\right)$, where $\mu_\tau$ is the $\tau$th quantile of Gamma(2,1).

Model 4. $Y\sim N\left(1+2X_1+3X_2-\mu_\tau|1+2X_1+3X_2|,\ |1+2X_1+3X_2|^2\right)$, where $\mu_\tau$ is the $\tau$th quantile of $N(0,1)$.
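The four data-generating models can be sketched in Python as follows. This is our own illustration using only the standard library (`random`, `statistics.NormalDist`); the truncation of $X_1$ is implemented by rejection sampling, and the $\tau$th quantile of Gamma(2,1) is found by bisection on its closed-form c.d.f. $1-e^{-x}(1+x)$. These implementation choices and the function names are assumptions, not necessarily what was used to produce the tables.

```python
import math
import random
from statistics import NormalDist


def gamma2_quantile(tau, lo=0.0, hi=50.0, iters=200):
    """tau-th quantile of Gamma(shape=2, scale=1) by bisection on F(x) = 1 - exp(-x)(1 + x)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if 1.0 - math.exp(-mid) * (1.0 + mid) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def draw_covariates(rng):
    """X1 ~ N(3, 1) truncated to [1, 1000] (by rejection) and X2 ~ Bernoulli(0.3)."""
    while True:
        x1 = rng.gauss(3.0, 1.0)
        if 1.0 <= x1 <= 1000.0:
            break
    x2 = 1 if rng.random() < 0.3 else 0
    return x1, x2


def draw_response(model, tau, x1, x2, rng):
    """One response from Models 1-4; each has true tau-th quantile q = 1 + 2*x1 + 3*x2."""
    q = 1.0 + 2.0 * x1 + 3.0 * x2
    if model == 1:
        mu = NormalDist().inv_cdf(tau)
        return q + (rng.gauss(0.0, 1.0) - mu)
    if model == 2:
        mu = -math.log(1.0 - tau)                  # tau-th quantile of Gamma(1, 1)
        return q - mu + rng.gammavariate(1.0, 1.0)
    if model == 3:
        mu = gamma2_quantile(tau)
        return rng.gammavariate(2.0, q / mu)       # shape 2, scale q / mu
    if model == 4:
        mu = NormalDist().inv_cdf(tau)
        return rng.gauss(q - mu * abs(q), abs(q))  # standard deviation |q|
    raise ValueError("model must be 1, 2, 3 or 4")


def simulate(model, tau, n, seed=0):
    """A data set of n tuples (x1, x2, y)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1, x2 = draw_covariates(rng)
        data.append((x1, x2, draw_response(model, tau, x1, x2, rng)))
    return data
```

Each branch is constructed so that $P(Y\leq q_\tau(X)\mid X)=\tau$; for instance, in Model 3, $P(Y\leq q)=F_{\Gamma(2,1)}(\mu_\tau)=\tau$.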

The specified model for the $\tau$th quantile is $Q_\tau(X)=\alpha+\beta_1X_1+\beta_2X_2$. We present results for different quantiles ($\tau\in\{0.25,0.75\}$) and different sample sizes ($N\in\{50,2000\}$), and analyze them with respect to a relatively flat prior (i.e. a product of three $N(0,100)$ distributions) as well as an informative prior for the parameters $(\alpha,\beta_1,\beta_2)$ (i.e. a product of $N(0.9,1)$, $N(2.1,1)$ and $N(2.9,1)$). In addition, we present two scenarios for the scale parameter $\sigma$ of the ALD: (i) $\sigma=1$ and (ii) $\sigma\sim$ Gamma prior with mean 100 and variance 1000. The conclusions reached were similar for other scenarios involving values of $\tau\in\{0.05,0.5,0.95\}$, $\sigma\in\{1/2,2\}$ and $N\in\{100,500\}$; these scenarios are not shown here in the interest of conciseness.

Recall that the proposed sandwich likelihood method (denoted SLBA) is based on the posterior mean $\bar{\beta}_n$ from Step 1. We compare it with the alternative method that uses the classical estimator $\hat{\beta}_n$ (denoted SLQR). We also compare with frequentist quantile regression (denoted QR) and with the Bayesian approach based purely on the ALD, which is the same as Step 1 (denoted ALD). For the QR method, we use the bootstrap to compute confidence intervals, and for the other methods we compute posterior credible intervals using 1000 MCMC draws after a burn-in of 2000 draws. The methods are compared with respect to the coverage of the 95% confidence/credible interval (denoted COV) and the length of the interval (denoted LEN). The coverage (COV) is computed by repeating the above simulation 200 times and calculating the percentage of times the 95% confidence/credible intervals contain the true value. Similarly, the length (LEN) is computed as the average interval length across the 200 repetitions.
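The COV and LEN summaries can be sketched as follows. This is our own illustration with hypothetical function names; it assumes equal-tailed credible intervals formed from the post-burn-in draws with linear interpolation between order statistics. The text does not state which quantile convention or which bootstrap interval was used, so those details are assumptions.

```python
def empirical_quantile(sorted_vals, prob):
    """Linear interpolation between order statistics at position prob * (m - 1)."""
    pos = prob * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def credible_interval(chain, level=0.95, burn_in=2000):
    """Equal-tailed credible interval from one MCMC chain after discarding the burn-in."""
    kept = sorted(chain[burn_in:])
    tail = (1.0 - level) / 2.0
    return empirical_quantile(kept, tail), empirical_quantile(kept, 1.0 - tail)


def coverage_and_length(intervals, true_value):
    """COV (% of intervals containing the true value) and LEN (mean length) over repetitions."""
    hits = sum(lo <= true_value <= hi for lo, hi in intervals)
    cov = 100.0 * hits / len(intervals)
    length = sum(hi - lo for lo, hi in intervals) / len(intervals)
    return cov, length
```

In the study, `coverage_and_length` would be applied to the 200 intervals obtained for each parameter, method and scenario, yielding one (COV, LEN) cell of Tables 1 and 2.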

Table 1 compares the methods across different scenarios when the sample size is large ($N=2000$), and Table 2 does the comparison when the sample size is small ($N=50$). Sub-tables (a) and (b) are for a relatively flat prior on $\beta$, and sub-tables (c) and (d) are for an informative prior. Further, sub-tables (a) and (c) fix the ALD scale parameter at 1, whereas sub-tables (b) and (d) carry out the analysis with a gamma prior on the ALD scale parameter.

We can make the following observations from Table 1. For large $N$ (= 2000), it is desirable that Bayesian and classical inferences merge asymptotically. Here, as expected from our results, the coverages and interval lengths from both sandwich likelihood methods, viz. the proposed SLBA and the alternative SLQR, as well as from the classical QR method, are close to each other. SLBA is close to SLQR for relatively flat priors, and performs slightly better with an informative prior. However, compared to these methods, the coverages from the ALD method in Tables 1(a) and 1(c) (i.e. scale = 1) are not as close to 95%. For the simpler Models 1 and 2 with i.i.d. errors, coverages under ALD are consistently above 95% (between 96% and 100%), with longer intervals than QR. For the more complex Models 3 and 4 with i.n.i.d. errors, coverages are well below 95%. So, for more complex models, the inadequacy of coverage under ALD is more pronounced. This issue with ALD can be partly addressed by allowing some flexibility in the ALD scale parameter. This is checked in Tables 1(b) and 1(d), where a prior is assumed on $\sigma$ and the coverages from the ALD method are seen to improve for Models 3 and 4. Since the ALD is still a misspecification, the issue does not fully go away (e.g. as seen in the coverages for $\beta_1$ in Table 1(d) for $\tau=0.75$). Finally, since the sample size is large, the results are similar for a relatively flat prior and for an informative prior. In summary, the simulation results in Table 1 support the asymptotic results in Section 3. They also highlight that Bayesian inference based purely on the ALD could be especially misleading when the true data-generating likelihood is complex.

To further demonstrate the usefulness of SLBA, Table 2 shows the simulation results for a small sample size $N=50$. When the prior is relatively flat, as in sub-tables (a) and (b), the observations made in the previous paragraph still more or less hold. However, when we use an informative prior, as in sub-tables (c) and (d), differences start to show. Unlike sub-tables (a)
Table 2: Comparison of methods when the data size is small (N=50). The numbers within parentheses are coverage and interval length (COV in %, LEN).

(a): N=50, relatively flat prior on β, ALD scale parameter fixed at σ=1.

| Model | τ | α: QR | α: ALD | α: SLQR | α: SLBA | β1: QR | β1: ALD | β1: SLQR | β1: SLBA | β2: QR | β2: ALD | β2: SLQR | β2: SLBA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.25 | (92,2.78) | (100,3.53) | (95,2.91) | (98,2.91) | (94,0.88) | (100,1.12) | (96,0.93) | (97,0.93) | (94,1.92) | (100,2.43) | (96,2.06) | (98,2.06) |
| 2 | 0.25 | (97,1.26) | (100,2.72) | (100,1.72) | (100,1.72) | (98,0.4) | (100,0.87) | (100,0.56) | (100,0.56) | (96,0.88) | (100,1.88) | (100,1.23) | (100,1.23) |
| 3 | 0.25 | (96,16.5) | (75,7.37) | (94,12.59) | (95,12.59) | (97,6.11) | (71,2.55) | (92,4.81) | (94,4.81) | (96,14.66) | (67,5.92) | (90,11.98) | (92,11.98) |
| 4 | 0.25 | (96,18.05) | (78,7.7) | (92,13.64) | (94,13.64) | (96,6.72) | (68,2.68) | (92,5.29) | (92,5.29) | (96,15.78) | (66,6.23) | (91,13.09) | (93,13.09) |
| 1 | 0.75 | (94,2.67) | (100,3.32) | (95,2.73) | (96,2.73) | (96,0.78) | (100,0.98) | (96,0.82) | (98,0.82) | (94,1.68) | (100,2.11) | (96,1.74) | (99,1.74) |
| 2 | 0.75 | (98,3.28) | (100,3.55) | (97,3.17) | (99,3.17) | (97,0.94) | (100,1.03) | (98,0.92) | (99,0.92) | (95,2.19) | (99,2.34) | (96,2.17) | (98,2.17) |
| 3 | 0.75 | (96,12.84) | (82,6.57) | (94,11.05) | (95,11.05) | (94,4.43) | (74,2.1) | (91,3.88) | (92,3.88) | (98,9.17) | (72,4.56) | (96,8.24) | (95,8.24) |
| 4 | 0.75 | (98,20.55) | (74,8.13) | (95,15.6) | (96,15.6) | (96,6.97) | (67,2.59) | (90,5.44) | (91,5.44) | (98,14.16) | (64,5.55) | (92,11.71) | (95,11.71) |


(b): N=50, relatively flat prior on β, inverse gamma prior on ALD scale parameter.

| Model | τ | α: QR | α: ALD | α: SLQR | α: SLBA | β1: QR | β1: ALD | β1: SLQR | β1: SLBA | β2: QR | β2: ALD | β2: SLQR | β2: SLBA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.25 | (97,3.37) | (80,2.01) | (93,3.25) | (92,3.25) | (96,1.05) | (80,0.62) | (92,1) | (92,1) | (97,1.82) | (80,1.09) | (95,1.72) | (96,1.72) |
| 2 | 0.25 | (98,1.49) | (94,1.12) | (96,1.44) | (98,1.44) | (99,0.47) | (96,0.35) | (97,0.45) | (97,0.45) | (96,0.84) | (92,0.61) | (93,0.77) | (97,0.77) |
| 3 | 0.25 | (98,21.96) | (94,14.06) | (93,15.35) | (97,15.35) | (98,7.43) | (94,4.6) | (93,5.34) | (97,5.34) | (96,14.71) | (86,8.98) | (92,11.86) | (96,11.86) |
| 4 | 0.25 | (98,22.92) | (92,13.3) | (96,16.2) | (98,16.2) | (98,7.74) | (86,4.35) | (94,5.67) | (96,5.67) | (94,15.44) | (86,8.7) | (93,12.94) | (94,12.94) |
| 1 | 0.75 | (94,3.01) | (84,1.82) | (91,2.9) | (93,2.9) | (94,0.94) | (83,0.56) | (90,0.92) | (93,0.92) | (96,2.16) | (90,1.28) | (96,2.08) | (96,2.08) |
| 2 | 0.75 | (94,3.63) | (80,2.05) | (89,3.5) | (92,3.5) | (93,1.11) | (82,0.63) | (91,1.08) | (92,1.08) | (94,2.64) | (82,1.42) | (90,2.44) | (93,2.44) |
| 3 | 0.75 | (93,12.81) | (86,7.57) | (92,10.97) | (94,10.97) | (94,4.67) | (82,2.55) | (92,4.06) | (94,4.06) | (93,13.49) | (73,6.38) | (90,10.93) | (92,10.93) |
| 4 | 0.75 | (96,19.67) | (88,12.13) | (91,15.08) | (96,15.08) | (92,7.01) | (80,4.04) | (90,5.47) | (93,5.47) | (94,20.65) | (80,10.44) | (91,15.54) | (92,15.54) |


(c): N=50, informative prior on β, ALD scale parameter fixed at σ=1.

| Model | τ | α: QR | α: ALD | α: SLQR | α: SLBA | β1: QR | β1: ALD | β1: SLQR | β1: SLBA | β2: QR | β2: ALD | β2: SLQR | β2: SLBA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.25 | (96,2.69) | (100,2.46) | (84,1.38) | (98,1.38) | (96,0.79) | (100,0.76) | (86,0.48) | (98,0.48) | (96,1.7) | (100,1.79) | (94,1.19) | (98,1.19) |
| 2 | 0.25 | (96,1.18) | (100,2.04) | (98,0.99) | (100,0.99) | (96,0.35) | (100,0.63) | (98,0.33) | (100,0.33) | (95,0.75) | (100,1.4) | (98,0.74) | (99,0.74) |
| 3 | 0.25 | (98,18.89) | (98,3.35) | (36,2.28) | (100,2.28) | (96,6.39) | (79,1.31) | (74,1.61) | (92,1.61) | (96,12.68) | (87,3.09) | (66,2.74) | (98,2.74) |
| 4 | 0.25 | (96,19.72) | (98,3.37) | (30,2.3) | (99,2.3) | (92,6.71) | (78,1.33) | (72,1.69) | (91,1.69) | (96,13.81) | (86,3.14) | (64,2.79) | (98,2.79) |
| 1 | 0.75 | (94,2.67) | (100,2.45) | (78,1.38) | (98,1.38) | (96,0.78) | (100,0.76) | (84,0.48) | (98,0.48) | (94,1.68) | (100,1.8) | (90,1.2) | (98,1.2) |
| 2 | 0.75 | (98,3.28) | (100,2.55) | (82,1.48) | (100,1.48) | (97,0.94) | (100,0.79) | (86,0.52) | (100,0.52) | (95,2.19) | (100,1.93) | (91,1.37) | (98,1.37) |
| 3 | 0.75 | (96,12.84) | (96,3.23) | (44,2.17) | (97,2.17) | (94,4.43) | (88,1.23) | (76,1.41) | (92,1.41) | (98,9.17) | (90,2.92) | (72,2.57) | (98,2.57) |
| 4 | 0.75 | (98,20.55) | (98,3.4) | (34,2.32) | (98,2.32) | (96,6.97) | (77,1.34) | (72,1.69) | (88,1.69) | (98,14.16) | (88,3.15) | (65,2.79) | (100,2.79) |


(d): N=50, Informative prior on β, Inverse gamma prior on ALD scale parameter.

Model | τ | α: QR, ALD, SLQR, SLBA | β1: QR, ALD, SLQR, SLBA | β2: QR, ALD, SLQR, SLBA
1 | 0.25 | (94,2.68) (86,1.48) (92,1.82) (95,1.82) | (95,0.78) (84,0.43) (92,0.56) (94,0.56) | (96,1.71) (87,0.98) (94,1.36) (95,1.36)
2 | 0.25 | (98,1.15) (94,0.85) (95,0.97) (98,0.97) | (98,0.34) (94,0.25) (96,0.29) (96,0.29) | (96,0.76) (93,0.55) (92,0.66) (96,0.66)
3 | 0.25 | (98,19.18) (100,3.6) (12,1.27) (98,1.27) | (95,6.5) (98,1.67) (52,1.54) (94,1.54) | (98,13.5) (100,3.57) (28,1.87) (98,1.87)
4 | 0.25 | (98,19.58) (100,3.57) (17,1.44) (100,1.44) | (96,6.7) (94,1.63) (54,1.62) (94,1.62) | (96,14.52) (100,3.52) (34,2.05) (98,2.05)
1 | 0.75 | (92,2.7) (81,1.5) (92,1.84) (92,1.84) | (94,0.82) (82,0.46) (90,0.59) (96,0.59) | (96,1.77) (82,1.03) (94,1.41) (96,1.41)
2 | 0.75 | (94,3.34) (85,1.67) (93,2.02) (96,2.02) | (95,1.02) (86,0.52) (94,0.68) (96,0.68) | (96,2.31) (84,1.2) (95,1.68) (96,1.68)
3 | 0.75 | (96,11.39) (100,3.27) (38,1.94) (100,1.94) | (94,4) (89,1.27) (66,1.31) (92,1.31) | (94,10.29) (96,3.15) (68,2.46) (100,2.46)
4 | 0.75 | (96,18.73) (100,3.56) (16,1.42) (98,1.42) | (95,6.66) (90,1.62) (51,1.61) (92,1.61) | (98,15.68) (100,3.58) (32,2.01) (99,2.01)

and (b), where the lengths of credible intervals for the QR and SLBA methods are similar, the SLBA method has much smaller intervals than QR in sub-tables (c) and (d), while still retaining coverages around 95%. This should not be surprising, since the QR method is a classical approach and does not use the prior information, whereas the SLBA method uses the informative prior. In contrast, the SLQR method performs poorly, particularly for Models 3 and 4, where its coverage drops as low as 30% in sub-table (c) and 12% in sub-table (d). This is because SLQR leads to intervals of similar length to those of SLBA, but centered at the classical estimator $\hat\beta_n$. Since $\hat\beta_n$ does not use the informative prior, this centering can be inaccurate. The ALD method performs better with an informative prior, but SLBA continues to have better coverage.

In summary, the proposed SLBA method works better than the SLQR, ALD and QR methods across all scenarios. Further, while SLBA and SLQR are asymptotically equivalent and perform similarly for large sample sizes or relatively flat priors, the SLBA method consistently works better, markedly so for small sample sizes and informative priors.
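Each entry in the sub-tables above appears to pair an interval's coverage (in %) with its average length, since the discussion refers to coverages around 95% and to interval lengths. The Python sketch below shows one way such a pair could be computed from simulated replicates. The helper name `coverage_and_length`, the rounding, and the interval endpoints in the example are illustrative assumptions, not the paper's code or data.

```python
from typing import Sequence, Tuple


def coverage_and_length(lower: Sequence[float], upper: Sequence[float],
                        truth: float) -> Tuple[int, float]:
    """Return (coverage in %, mean length) for intervals [lower[r], upper[r]].

    Hypothetical helper: coverage is the percentage of replicates whose
    interval contains `truth`, rounded to the nearest integer.
    """
    if len(lower) != len(upper) or len(lower) == 0:
        raise ValueError("need the same, non-zero number of lower and upper endpoints")
    hits = sum(1 for lo, hi in zip(lower, upper) if lo <= truth <= hi)
    coverage = round(100 * hits / len(lower))
    mean_length = sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)
    return coverage, round(mean_length, 2)


if __name__ == "__main__":
    # Four made-up replicate intervals for a parameter whose true value is 1.0.
    lower = [0.2, 0.5, 1.1, 0.0]
    upper = [1.8, 1.5, 2.3, 2.0]
    print(coverage_and_length(lower, upper, truth=1.0))  # (75, 1.45)
```

Three of the four made-up intervals contain 1.0, so the coverage is 75%; in a real study the endpoints would come from each method's interval in every simulated data set.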


Appendix 


Proof of Lemma 1. The proof is by extending ideas from Sriram et al. (2013). A sketch is provided here. Let $\Delta_n = \ldots$ and $k \in \{0,1,2\}$. Following the arguments leading up to Lemma 5 of their paper, a compact set $G \subset \{\|\beta-\beta_0\| < M_0\}$ exists such that, for some $\nu > 0$ and sufficiently large $n$,

$$\int_{G^c} \left(\sqrt{n}\,\|\beta-\beta_0\|\right)^k \frac{f^{(n)}_{\beta}}{f^{(n)}_{\beta_0}}\, d\Pi(\beta) \le n^{k/2} e^{-n\nu} \int_{G^c} \|\beta-\beta_0\|^k \pi(\beta)\, d\beta. \tag{16}$$


By Assumption 1b, the integral on the right-hand side is finite, and hence the right-hand side goes to zero as $n \to \infty$.


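Along with the fact that, for any $\varepsilon > 0$, $e^{n\varepsilon} \int \frac{f^{(n)}_{\beta}}{f^{(n)}_{\beta_0}}\, d\Pi(\beta) \to \infty$ a.s. $[P]$, it then follows that

$$E\left[\int_{G^c} \left(\sqrt{n}\,\|\beta-\beta_0\|\right)^k d\Pi_n(\beta)\right] \to 0. \tag{17}$$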


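Now, for any $\varepsilon_1 > 0$ (to be chosen later), using the arguments leading up to the proof of Theorem 1 of Sriram et al. (2013), there exist $0 < d < 1$ and positive constants $C_1, C_2$ such that

$$\begin{aligned}
E\left[\int_{G\cap\{\|\beta-\beta_0\|>\varepsilon_1\}} \left(\sqrt{n}\,\|\beta-\beta_0\|\right)^k d\Pi_n(\beta)\right]
&\le n^{k/2} M_0^k\, E\left[\Pi_n\left(G\cap\{\|\beta-\beta_0\|>\varepsilon_1\}\right)\right] \\
&\le n^{k/2} M_0^k\, E\left[\left(\Pi_n\left(G\cap\{\|\beta-\beta_0\|>\varepsilon_1\}\right)\right)^d\right]
\le C_1 n^{k/2} M_0^k e^{-nC_2}.
\end{aligned}$$

The first inequality uses $\|\beta-\beta_0\| < M_0$ on $G$, and the second uses $0 \le \Pi_n(\cdot) \le 1$ with $0 < d < 1$. Since the right-hand side converges to zero as $n \to \infty$, we get

$$E\left[\int_{G\cap\{\|\beta-\beta_0\|>\varepsilon_1\}} \left(\sqrt{n}\,\|\beta-\beta_0\|\right)^k d\Pi_n(\beta)\right] \to 0. \tag{18}$$

By Assumption 2 and Lemma 1 of Sriram et al. (2013), there exists a constant $C_3 > 0$ such that, for all $\beta$ with $\|\beta-\beta_0\| < \varepsilon_1 := \frac{1}{4C_3}$, we have

$$\left|\log\frac{f_{\beta}}{f_{\beta_0}}\right| \le C_3\|\beta-\beta_0\| < \frac{1}{4}.$$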
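Since $e^t - 1 < 2t$ for $t < 1/2$, and using the fact that $E\left[\log\frac{f_{\beta_0}}{f_\beta}\right] \ge 0$, we have, for all $\beta \in \{\|\beta-\beta_0\| < \varepsilon_1\}$,

$$E\left[\frac{f_\beta}{f_{\beta_0}}\right] \le 1 + 2E\left[\log\frac{f_\beta}{f_{\beta_0}}\right] \le e^{-2E\left[\log\frac{f_{\beta_0}}{f_\beta}\right]}.$$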


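We can write $\{\Delta_n < \|\beta-\beta_0\| < \varepsilon_1\} \subset \bigcup_{j\ge1} A_{jn}$, where $A_{jn} = \{\beta : j\Delta_n < \|\beta-\beta_0\| \le (j+1)\Delta_n\}$. We note, using Lemma 2(c) of Sriram et al. (2013) and Assumption 3b, that for sufficiently large $n$ and some constant $C_4 > 0$,

$$\sum_{i=1}^n E\left[\log\frac{f_{\beta_0}(Y_i)}{f_\beta(Y_i)}\right] \ge C_4\, n\,\|\beta-\beta_0\|^2.$$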


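It follows that

$$\begin{aligned}
E\left[\int_{\Delta_n<\|\beta-\beta_0\|<\varepsilon_1} \left(\sqrt{n}\,\|\beta-\beta_0\|\right)^k \prod_{i=1}^n \frac{f_\beta(Y_i)}{f_{\beta_0}(Y_i)}\,\pi(\beta)\,d\beta\right]
&\le \sum_{j\ge1}\int_{A_{jn}} \left(\sqrt{n}\,\|\beta-\beta_0\|\right)^k e^{-2\sum_{i=1}^n E\left[\log\frac{f_{\beta_0}(Y_i)}{f_\beta(Y_i)}\right]}\,\pi(\beta)\,d\beta \\
&\le \sum_{j\ge1} \left(n(j+1)^2\Delta_n^2\right)^{k/2} e^{-C_4 n j^2 \Delta_n^2}\,\Pi(A_{jn}) \\
&\le C_5\,(n\Delta_n^2)^{k/2} e^{-C_4 n\Delta_n^2}\,\Delta_n^p \sum_{j\ge1}(j+1)^{k+p} e^{-C_4 n (j^2-1)\Delta_n^2}.
\end{aligned}$$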


The last expression uses the fact that $\beta \in \mathbb{R}^p$, and hence $\Pi(A_{jn})$ is less than a constant multiple (say $C_5$) of $((j+1)\Delta_n)^p$. It is easy to see that the summation in the above expression is bounded for all $n$. Hence, for some constant $C_6$ and sufficiently large $n$, we have


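$$E\left[\int_{\Delta_n<\|\beta-\beta_0\|<\varepsilon_1} \left(\sqrt{n}\,\|\beta-\beta_0\|\right)^k \prod_{i=1}^n \frac{f_\beta(Y_i)}{f_{\beta_0}(Y_i)}\,\pi(\beta)\,d\beta\right] \le C_6\,(n\Delta_n^2)^{k/2} e^{-C_4 n\Delta_n^2}\,\Delta_n^p. \tag{19}$$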
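The bound (19) relies on the series $\sum_{j\ge1}(j+1)^{k+p}e^{-C_4 n(j^2-1)\Delta_n^2}$ staying bounded. As a function of $x = n\Delta_n^2$ it is decreasing and tends to its first term $2^{k+p}$, so it is bounded whenever $n\Delta_n^2$ stays away from zero. The Python sketch below checks this numerically; `shell_series`, the truncation at 200 terms, and the values of $k$, $p$ and $C_4$ are illustrative choices, not quantities from the paper.

```python
import math


def shell_series(x: float, k: int, p: int, c4: float, terms: int = 200) -> float:
    """Partial sum of sum_{j>=1} (j+1)^(k+p) * exp(-c4 * (j^2 - 1) * x).

    Here x plays the role of n * Delta_n^2. Truncating at `terms` is harmless
    once x is bounded away from zero, because the terms decay like exp(-c4 * j^2 * x).
    """
    return sum((j + 1) ** (k + p) * math.exp(-c4 * (j * j - 1) * x)
               for j in range(1, terms + 1))


if __name__ == "__main__":
    k, p, c4 = 2, 3, 0.5  # illustrative values only
    for x in (0.5, 1.0, 5.0, 50.0):
        # Values decrease in x towards the first term 2^(k+p) = 32.
        print(x, shell_series(x, k, p, c4))
```

As $x \to 0$ every term tends to $(j+1)^{k+p}$ and the series diverges, which is why the lower bound on $n\Delta_n^2$ matters.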


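For some $C_4' > 0$ (to be chosen later), let $B_n := \{\beta : \|\beta-\beta_0\|^2 < C_4'\Delta_n^2\}$. Then, using an argument similar to that in the proof of Lemma 3.2 of Kleijn and van der Vaart (2012), we get $\Pi(B_n) \ge K\Delta_n^p$ for some constant $K$. Hence,

$$\int \prod_{i=1}^n \frac{f_\beta(Y_i)}{f_{\beta_0}(Y_i)}\,\pi(\beta)\,d\beta \ge \int_{B_n} \prod_{i=1}^n \frac{f_\beta(Y_i)}{f_{\beta_0}(Y_i)}\,\pi(\beta)\,d\beta \ge K\Delta_n^p e^{-nC_4'\Delta_n^2}. \tag{20}$$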
Choosing $C_4' = C_4/2$, equations (19) and (20) together imply

$$E\left[\int_{\Delta_n<\|\beta-\beta_0\|<\varepsilon_1} \left(\sqrt{n}\,\|\beta-\beta_0\|\right)^k d\Pi_n(\beta)\right] \le \frac{C_6}{K}\,(n\Delta_n^2)^{k/2} e^{-C_4 n\Delta_n^2/2} \to 0 \quad (\text{as } n\to\infty). \tag{21}$$


Equations (17), (18) and (21) together give the result

$$E\left[\int_{\|\beta-\beta_0\|>\Delta_n} \left(\sqrt{n}\,\|\beta-\beta_0\|\right)^k d\Pi_n(\beta)\right] \to 0.$$

□


Proof of Lemma 2. For a vector $L$, let $\theta := L^T\beta$, $\theta_0 := L^T\beta_0$, $\hat\theta_n := L^T\hat\beta_n$ and $\theta_n := L^T\beta_n$. By abuse of notation, we continue to use $\Pi_n(\cdot)$ to denote the posterior distribution of $\theta$ from Step 1. Since $\theta$ is just a linear combination of $\beta$, Lemma 1 implies (for $k \in \{0,1,2\}$)

$$\int_{\{\theta:\sqrt{n}|\theta-\theta_0|>M_n/2\}} \left(\sqrt{n}\,|\theta-\theta_0|\right)^k d\Pi_n(\theta) \to 0 \text{ in probability } [P]. \tag{22}$$


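Note that, since $(a+b)^2 \le 2a^2 + 2b^2$,

$$\int_{\{\theta:\sqrt{n}|\theta-\theta_0|>M_n/2\}} n|\theta-\hat\theta_n|^2\, d\Pi_n(\theta) \le 2\int_{\{\theta:\sqrt{n}|\theta-\theta_0|>M_n/2\}} n|\theta-\theta_0|^2\, d\Pi_n(\theta) + 2n|\hat\theta_n-\theta_0|^2\, \Pi_n\left(\sqrt{n}|\theta-\theta_0|>M_n/2\right). \tag{23}$$

Using equation (22) and the fact that $n|\hat\theta_n-\theta_0|^2$ is bounded in probability, we get that the right-hand side of equation (23) converges to zero in probability. Hence,

$$\int_{\{\theta:\sqrt{n}|\theta-\theta_0|>M_n/2\}} n|\theta-\hat\theta_n|^2\, d\Pi_n(\theta) \to 0 \text{ in probability } [P]. \tag{24}$$

Further,

$$\int_{\{\theta:\sqrt{n}|\theta-\hat\theta_n|>M_n\}} n|\theta-\hat\theta_n|^2\, d\Pi_n(\theta) \le \int_{\{\theta:\sqrt{n}|\theta-\theta_0|>M_n/2\}} n|\theta-\hat\theta_n|^2\, d\Pi_n(\theta) + \int_{\{\theta:\sqrt{n}|\theta-\theta_0|\le M_n/2,\ \sqrt{n}|\theta-\hat\theta_n|>M_n\}} n|\theta-\hat\theta_n|^2\, d\Pi_n(\theta). \tag{25}$$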

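The first term on the right-hand side of (25) is the same as that in equation (24). Since $\sqrt{n}|\theta-\theta_0| \le M_n/2$ and $\sqrt{n}|\theta-\hat\theta_n| > M_n$ together imply $\sqrt{n}|\hat\theta_n-\theta_0| > M_n/2$, the second term can be nonzero only on this event, so for any $\varepsilon > 0$ we can write

$$P\left(\int_{\{\theta:\sqrt{n}|\theta-\theta_0|\le M_n/2,\ \sqrt{n}|\theta-\hat\theta_n|>M_n\}} n|\theta-\hat\theta_n|^2\, d\Pi_n(\theta) > \varepsilon\right) \le P\left(\sqrt{n}|\hat\theta_n-\theta_0| > M_n/2\right).$$

Since the right-hand side goes to zero as $n \to \infty$, the second term of equation (25) converges to zero in probability. Therefore, we have established that

$$\int_{\{\theta:\sqrt{n}|\theta-\hat\theta_n|>M_n\}} n|\theta-\hat\theta_n|^2\, d\Pi_n(\theta) \to 0 \text{ in probability } [P]. \tag{26}$$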


Now, let $Q_n$ denote the distribution of $\sqrt{n}(\theta-\hat\theta_n)$ under $\theta \sim \Pi_n$, and let $Q$ denote $N(0, L^TV^{-1}L)$. Then Theorem 1 implies that $Q_n$ converges to $Q$ with respect to the total variation norm. Therefore, for any fixed $M$, $\int_{|x|\le M} x^2\, dQ_n \to \int_{|x|\le M} x^2\, dQ$ as $n \to \infty$, and $\int_{|x|\le M} x^2\, dQ \to \int x^2\, dQ$ as $M \to \infty$. So there exists a sequence $\{M_n\}$ such that $\int_{|x|\le M_n} x^2\, dQ_n \to \int x^2\, dQ$ as $n \to \infty$. For such a sequence, using equation (26) we get $\int x^2\, dQ_n \to \int x^2\, dQ$ as $n \to \infty$. Equivalently,

$$\int n(\theta-\hat\theta_n)^2\, d\Pi_n \to L^TV^{-1}L \quad \text{as } n \to \infty. \tag{27}$$


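Following the same approach as above, we can also conclude that

$$\int \sqrt{n}\,|\theta-\hat\theta_n|\, d\Pi_n \to \int |x|\, dQ(x) \quad \text{as } n \to \infty, \tag{28}$$

$$\int \left(\sqrt{n}(\theta-\hat\theta_n)\right)^+ d\Pi_n \to \int x^+\, dQ(x) \quad \text{as } n \to \infty. \tag{29}$$

Since $x = 2x^+ - |x|$ and $\int x\, dQ(x) = 0$, equations (28) and (29) give $\int \sqrt{n}(\theta-\hat\theta_n)\, d\Pi_n \to 0$, which implies that

$$\sqrt{n}\left(L^T\beta_n - L^T\hat\beta_n\right) \to 0 \text{ in probability } [P].$$

This, along with equation (27), implies that

$$\int n(\theta-\theta_n)^2\, d\Pi_n \to L^TV^{-1}L \quad \text{as } n \to \infty.$$

Since the above two results hold for any vector $L$, statements (a) and (b) of Lemma 2 follow. □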
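The passage from equations (28) and (29) to the centering result rests on the pointwise identity $x = 2x^+ - |x|$: integrating it against $Q = N(0, L^TV^{-1}L)$ gives $2\int x^+\,dQ - \int |x|\,dQ = \int x\,dQ = 0$. The Python sketch below checks the identity on a finite sample and checks the Gaussian limit in closed form, using the standard facts $E|X| = s\sqrt{2/\pi}$ and $E[X^+] = s/\sqrt{2\pi}$ for $X \sim N(0, s^2)$. The helper names are hypothetical.

```python
import math
from typing import Sequence


def mean_via_positive_part(draws: Sequence[float]) -> float:
    """Recover the sample mean as 2 * mean(x^+) - mean(|x|), using x = 2x^+ - |x|."""
    n = len(draws)
    mean_pos = sum(max(x, 0.0) for x in draws) / n
    mean_abs = sum(abs(x) for x in draws) / n
    return 2.0 * mean_pos - mean_abs


def gaussian_limit(s: float) -> float:
    """2 E[X^+] - E|X| for X ~ N(0, s^2); this equals E[X] = 0."""
    return 2.0 * s / math.sqrt(2.0 * math.pi) - s * math.sqrt(2.0 / math.pi)


if __name__ == "__main__":
    sample = [1.0, -2.0, 0.5]
    # Both numbers agree: the identity holds draw by draw.
    print(mean_via_positive_part(sample), sum(sample) / len(sample))
    print(gaussian_limit(1.7))  # zero up to floating-point rounding
```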
Proof of Lemma 3. Recall $Z_i = Y_i - X_i^T\beta_0$. Let $P_i(\cdot)$ denote the cumulative distribution function for the p.d.f. $p_i(\cdot)$. Along the lines of the proof of Theorem 4.1 in Koenker (2005), we write

$$U_n(\delta) := -\log\frac{f^{(n)}_{\beta_0+\delta/\sqrt{n}}}{f^{(n)}_{\beta_0}} = U_{1n}(\delta) + E[U_{2n}(\delta)] + \left(U_{2n}(\delta) - E[U_{2n}(\delta)]\right),$$

where

$$U_{1n}(\delta) := -\delta^T\frac{1}{\sqrt{n}}\sum_{i=1}^n X_i\left(\tau - I(Z_i<0)\right) \quad\text{and}\quad U_{2n}(\delta) := \sum_{i=1}^n \int_0^{X_i^T\delta/\sqrt{n}} \left(I\{Z_i\le s\} - I\{Z_i\le 0\}\right) ds = \sum_{i=1}^n U_{2ni}(\delta).$$
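The decomposition $U_n = U_{1n} + U_{2n}$ follows from Knight's identity, $\rho_\tau(u - v) - \rho_\tau(u) = -v(\tau - I(u<0)) + \int_0^v (I\{u\le s\} - I\{u\le 0\})\,ds$, applied with $u = Z_i$ and $v = X_i^T\delta/\sqrt{n}$ and summed over $i$ (the ALD normalizing constants cancel in the likelihood ratio). The Python sketch below checks the identity numerically, approximating the integral with a midpoint rule. The names `rho`, `psi` and `knight_rhs`, and the step count, are illustrative choices.

```python
def rho(u: float, tau: float) -> float:
    """Check loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def psi(u: float, tau: float) -> float:
    """Score tau - I(u < 0), the factor multiplying X_i in U_1n."""
    return tau - (1.0 if u < 0 else 0.0)


def knight_rhs(u: float, v: float, tau: float, steps: int = 20000) -> float:
    """-v * psi(u) + int_0^v (I{u <= s} - I{u <= 0}) ds, integral by the midpoint rule.

    The step h is negative when v < 0, which yields the signed integral from 0 to v.
    """
    h = v / steps
    base = 1.0 if u <= 0 else 0.0
    integral = sum(((1.0 if u <= (i + 0.5) * h else 0.0) - base) * h
                   for i in range(steps))
    return -v * psi(u, tau) + integral


if __name__ == "__main__":
    for u, v, tau in [(0.3, 1.0, 0.25), (-0.4, -1.0, 0.75), (1.2, -0.5, 0.5)]:
        lhs = rho(u - v, tau) - rho(u, tau)
        # The two columns agree up to the midpoint-rule error (at most |v| / steps).
        print(u, v, tau, round(lhs, 4), round(knight_rhs(u, v, tau), 4))
```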

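Further, by Taylor's formula,

$$E(U_{2n}(\delta)) = \sum_{i=1}^n \int_0^{X_i^T\delta/\sqrt{n}} \left(P_i(X_i^T\beta_0+s) - P_i(X_i^T\beta_0)\right) ds = \sum_{i=1}^n \int_0^{X_i^T\delta/\sqrt{n}} p_i\left(X_i^T\beta_0 + r_{in}(s)\right) s\, ds,$$

where $0 < r_{in}(s) < s < X_i^T\delta/\sqrt{n}$. Therefore, by Assumptions 2 and 5, we have

$$\left|E(U_{2n}(\delta)) - \frac{1}{2n}\sum_{i=1}^n p_i(X_i^T\beta_0)\,\delta^TX_iX_i^T\delta\right| \le \sum_{i=1}^n \int_0^{X_i^T\delta/\sqrt{n}} C|r_{in}(s)|\, s\, ds \le C\left(\max_i\frac{|X_i^T\delta|}{\sqrt{n}}\right)\delta^T\left(\frac{1}{n}\sum_{i=1}^n X_iX_i^T\right)\delta.$$

It follows that

$$\sup_{\delta\in K}\left|E(U_{2n}(\delta)) - \frac{1}{2n}\sum_{i=1}^n p_i(X_i^T\beta_0)\,\delta^TX_iX_i^T\delta\right| \to 0 \text{ in probability } [P]. \tag{30}$$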
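Equation (30) says that $E(U_{2n}(\delta))$ is locally quadratic. For a single term this reads $\int_0^a \left(P_i(X_i^T\beta_0 + s) - P_i(X_i^T\beta_0)\right) ds \approx \tfrac12 p_i(X_i^T\beta_0)\,a^2$ with $a = X_i^T\delta/\sqrt{n}$. The Python sketch below checks this under the purely illustrative assumption $Z_i \sim N(0,1)$, so that $P_i(X_i^T\beta_0 + s) = \Phi(s)$ and $p_i(X_i^T\beta_0) = 1/\sqrt{2\pi}$. The error it reports is no larger than $|a|^3$, in line with the bound above.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_u2_term(a: float, steps: int = 10000) -> float:
    """E int_0^a (I{Z <= s} - I{Z <= 0}) ds = int_0^a (Phi(s) - Phi(0)) ds for Z ~ N(0, 1).

    Midpoint rule; for a < 0 the step is negative and the signed integral is returned.
    """
    h = a / steps
    return sum((normal_cdf((i + 0.5) * h) - normal_cdf(0.0)) * h for i in range(steps))


def quadratic_term(a: float) -> float:
    """Leading term 0.5 * p(0) * a^2, with p the N(0, 1) density."""
    return 0.5 * a * a / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    for a in (0.5, 0.1, 0.01):
        # The ratio approaches 1 as a = X_i^T delta / sqrt(n) shrinks.
        print(a, expected_u2_term(a) / quadratic_term(a))
```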


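To complete the proof, it is enough to show that $\sup_{\delta\in K}|U_{2n}(\delta) - E[U_{2n}(\delta)]| \to 0$ in probability $[P]$. To show this, let $\delta_i = \arg\max_{\delta\in K}|U_{2ni}(\delta) - E[U_{2ni}(\delta)]|$. Here $\delta_i$ is possibly random, since it can depend on $Z_i$, but it is well defined since $U_{2ni}(\delta)$ is a continuous (random) function on the compact set $K$. Then

$$P\left(\sup_{\delta\in K}|U_{2n}(\delta) - E[U_{2n}(\delta)]| > \varepsilon\right) \le P\left(\sum_{i=1}^n \sup_{\delta\in K}|U_{2ni}(\delta) - E[U_{2ni}(\delta)]| > \varepsilon\right) = P\left(\sum_{i=1}^n |U_{2ni}(\delta_i) - E[U_{2ni}(\delta_i)]| > \varepsilon\right) \le \sum_{i=1}^n \frac{E[U_{2ni}(\delta_i)^2]}{\varepsilon^2}. \tag{31}$$

The last step uses Chebyshev's inequality. Further, let $G = \sup_{\delta\in K}\sup_{i\ge1}|X_i^T\delta|$. In particular, $|X_i^T\delta_i| \le G$. Without loss of generality, if $X_i^T\delta_i > 0$ for all $i$, the right-hand side of equation (31) is

$$\begin{aligned}
&\frac{1}{\varepsilon^2}\sum_{i=1}^n E\left(\int_0^{X_i^T\delta_i/\sqrt{n}}\left(I\{Z_i\le s\} - I\{Z_i\le 0\}\right) ds\right)^2
\le \frac{1}{\varepsilon^2}\sum_{i=1}^n E\left(\int_0^{G/\sqrt{n}}\left(I\{Z_i\le s\} - I\{Z_i\le 0\}\right) ds\right)^2 \\
&\le \frac{G^2}{n\varepsilon^2}\sum_{i=1}^n E\left(I\{Z_i\le G/\sqrt{n}\} - I\{Z_i\le 0\}\right)
= \frac{G^2}{n\varepsilon^2}\sum_{i=1}^n \left(P_i\left(X_i^T\beta_0 + \frac{G}{\sqrt{n}}\right) - P_i(X_i^T\beta_0)\right) \\
&= \frac{G^3}{n^{3/2}\varepsilon^2}\sum_{i=1}^n p_i(X_i^T\beta_0 + r_{in}) \quad \text{for some } |r_{in}| < \frac{G}{\sqrt{n}}, \text{ using Taylor's formula.}
\end{aligned}$$

A consequence of Assumptions 2 and 5 is that $\{p_i(X_i^T\beta_0 + r_{in}), i \ge 1\}$ are uniformly bounded, so the last bound is $O(n^{-1/2})$. Hence, we can conclude that $P\left(\sup_{\delta\in K}|U_{2n}(\delta) - E[U_{2n}(\delta)]| > \varepsilon\right) \to 0$. □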